uncertainty parameter
Learning Uncertainty-Aware Temporally-Extended Actions
Lee, Joongkyu, Park, Seung Joon, Tang, Yunhao, Oh, Min-hwan
In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate policy learning through extended actions. However, a primary limitation in previous studies of action repetition is its potential to degrade performance, particularly when sub-optimal actions are repeated. This issue often negates the advantages of action repetition. To address this, we propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE). UTE employs ensemble methods to accurately measure uncertainty during action extension. This feature allows policies to strategically choose between emphasizing exploration or adopting an uncertainty-averse approach, tailored to their specific needs. We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments. Our findings show that UTE outperforms existing action repetition algorithms, effectively mitigating their inherent limitations and significantly enhancing policy learning efficiency.
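The trade-off described above, between exploring and avoiding uncertainty when extending an action, can be sketched as follows. This is only an illustration and not the authors' implementation: the scoring rule (ensemble mean plus `lam` times ensemble standard deviation), the function name `select_extension`, and the toy value estimates are all assumptions introduced here.

```python
from statistics import mean, pstdev


def select_extension(ensemble_q, lam):
    """Pick a repeat length from ensemble value estimates.

    ensemble_q: dict mapping a candidate repeat length to the list of
                value estimates produced by the ensemble members
                (hypothetical numbers, not from the paper).
    lam:        uncertainty weight; lam > 0 favours uncertain options
                (exploration), lam < 0 penalises them (uncertainty-averse).
    """
    scores = {
        length: mean(qs) + lam * pstdev(qs)
        for length, qs in ensemble_q.items()
    }
    # Return the repeat length with the highest uncertainty-adjusted score.
    return max(scores, key=scores.get)


# Toy estimates: repeating once is certain; repeating four times has a
# slightly higher mean but the ensemble members disagree strongly.
toy_estimates = {
    1: [1.0, 1.0, 1.0],
    4: [0.3, 1.1, 1.9],
}

exploratory_choice = select_extension(toy_estimates, lam=1.0)
averse_choice = select_extension(toy_estimates, lam=-1.0)
```

With these toy numbers, a positive `lam` selects the longer, more uncertain extension, while a negative `lam` falls back to the single-step action.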
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
Tanabe, Takumi, Sato, Rei, Fukuchi, Kazuto, Sakuma, Jun, Akimoto, Youhei
In reinforcement learning, policies are often trained in a simulation environment and then transferred to the corresponding real-world environment, because training in the real world is costly and risky. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification. Multiple studies report significant deterioration of policy performance in the real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance over the uncertainty parameter set to guarantee performance in the corresponding real-world environment. To obtain such a policy, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent-descent approach. Experiments in Multi-Joint dynamics with Contact (MuJoCo) environments show that the proposed method achieves worst-case performance superior to that of several baseline approaches.
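Simultaneous gradient ascent-descent on a max-min problem can be illustrated on a toy objective. The sketch below is not M2TD3: it has no actor, critic, or simulator. The scalar `theta` stands in for the policy parameters, `omega` for the uncertainty parameter, and the interval `omega_bounds` for the uncertainty parameter set. The objective `f`, the step size, and all names are assumptions chosen so that the saddle point is known in closed form.

```python
def f(theta, omega):
    # Toy objective: concave in theta (maximised), convex in omega (minimised).
    return -(theta - 1.0) ** 2 + theta * omega + omega ** 2


def grad_theta(theta, omega):
    return -2.0 * (theta - 1.0) + omega


def grad_omega(theta, omega):
    return theta + 2.0 * omega


def simultaneous_gda(theta=0.0, omega=0.0, lr=0.1, steps=300,
                     omega_bounds=(-1.0, 1.0)):
    """Run simultaneous gradient ascent (theta) and descent (omega)."""
    lo, hi = omega_bounds
    for _ in range(steps):
        # Both gradients are evaluated at the current point before
        # either variable is updated; this is what "simultaneous" means.
        g_theta = grad_theta(theta, omega)
        g_omega = grad_omega(theta, omega)
        theta = theta + lr * g_theta                    # ascent step
        omega = min(hi, max(lo, omega - lr * g_omega))  # projected descent step
    return theta, omega


theta_star, omega_star = simultaneous_gda()
```

For this objective the stationarity conditions give the saddle point `theta = 0.8`, `omega = -0.4`, which the iteration approaches because the objective is strongly concave in `theta` and strongly convex in `omega`. Objectives without this structure may not converge under the same update.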
Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network
Uncertainties in a structure are inevitable and generally lead to variation in dynamic response predictions. For a complex structure, brute-force Monte Carlo simulation for response variation analysis is infeasible, since a single run may already be computationally costly. Data-driven meta-modeling approaches have thus been explored to facilitate efficient emulation and statistical inference. The performance of a meta-model hinges upon both the quality and the quantity of the training dataset. In practice, however, high-fidelity data acquired from high-dimensional finite element simulation or experiment are generally scarce, which poses a significant challenge to meta-model establishment. In this research, we take advantage of the multi-level response prediction opportunity in structural dynamic analysis, i.e., rapidly acquiring a large amount of low-fidelity data from reduced-order modeling, and accurately acquiring a small amount of high-fidelity data from full-scale finite element analysis. Specifically, we formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained. It implicitly identifies the correlation between the low- and high-fidelity datasets, which yields improved accuracy when compared with the state of the art. Comprehensive investigations using frequency response variation characterization as a case example are carried out to demonstrate the performance.
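The idea of combining plentiful low-fidelity data with scarce high-fidelity data can be sketched with a much simpler stand-in than a composite neural network: a linear correction fitted by least squares on a few high-fidelity points. Everything below is an assumption for illustration only: the two toy models, the sample locations, and the linear form `high ≈ rho * low + offset` are not taken from the paper.

```python
import math


def low_fidelity(x):
    # Cheap toy model (stands in for a reduced-order model).
    return math.sin(x)


def high_fidelity(x):
    # Expensive toy model (stands in for full-scale finite element analysis).
    return 2.0 * math.sin(x) + 0.5


def fit_linear_correction(xs_high):
    """Least-squares fit of high ≈ rho * low + offset on a few samples."""
    lows = [low_fidelity(x) for x in xs_high]
    highs = [high_fidelity(x) for x in xs_high]
    n = len(xs_high)
    mean_low = sum(lows) / n
    mean_high = sum(highs) / n
    cov = sum((l - mean_low) * (h - mean_high) for l, h in zip(lows, highs))
    var = sum((l - mean_low) ** 2 for l in lows)
    rho = cov / var
    offset = mean_high - rho * mean_low
    return rho, offset


def fused_predict(x, rho, offset):
    # Only the cheap model is evaluated at prediction time.
    return rho * low_fidelity(x) + offset


# Three "expensive" evaluations are enough to calibrate the correction here.
rho, offset = fit_linear_correction([0.2, 1.0, 2.5])
```

Because the toy high-fidelity model is an exact linear function of the low-fidelity one, the fit recovers the relation from three samples. The composite network in the abstract targets correlations that a linear map like this would not capture.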